European Patent Office
Erhardtstrasse 27
D-80331 München
Germany 



Dear Sirs 

Re: International Patent Application No. PCT/FI2004/000331 
Tietoenator Oyj 
Our ref: 800293WO

In connection with filing of the Demand, we hereby submit amended claims and
respectfully present the following in response to the Written Opinion of the
International Searching Authority issued on 13 October 2004.

The independent claims 1, 21, 26 and 28 are hereby amended to define that a
synonym acceptance criterion takes into account the number of identical 
characters; support for the amendment can be found, for example, on page 1, 
lines 5-7 and page 20, lines 17-20. Claim 1 has also been amended to state that 
the counterparts are searched for after determining synonym candidates. The 
enclosed dependent claims remain unchanged. 

The present invention addresses processing of data records containing data 
fields. In other words, the present invention handles structured data. The present
invention aims to increase the accuracy of finding correct counterparts in a
reference data set for processed data records. The search for the counterpart in 
the reference set typically takes into account a number of data fields in the data 
record. 

The meaning of the identifier, which a data field represents, is irrelevant for the
present invention. The claimed invention is concerned with the value of the
data field, for example, with the character string in the data field. The amended
independent claims make this clear by stating that the synonym acceptance 
criterion takes into account the number of identical characters in the synonym 
candidate and in the value of the data field. 

In the claimed invention, there is a set of predetermined identifier values and a 
predefined synonym acceptance criterion. At least one synonym candidate is 
determined for a data field value from the set of predetermined identifier values 
and, if the data field value and the synonym candidate fulfill the synonym 
acceptance criterion, the synonym candidate and the data field value are 
associated as synonyms. This means that information about the synonyms is
updated, if the synonym acceptance criterion is fulfilled, based on the current
value of the data field. The update of the information about synonyms occurs
before the search for the counterpart in the reference data set. As the synonym
information has been updated based on the value of a data field in the current
data record, it is more likely that a counterpart is found for the data record
than when using static synonym information.
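
The sequence described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions and not the claimed implementation: the function names, the first-letter candidate selection rule, the 0.8 threshold and the example city names are all hypothetical, and the count of identical characters is computed here without regard to character order.

```python
from collections import Counter


def identical_chars(a: str, b: str) -> int:
    """Number of characters the two strings share (case-insensitive, order ignored)."""
    return sum((Counter(a.lower()) & Counter(b.lower())).values())


def accept(value: str, candidate: str, min_share: float = 0.8) -> bool:
    """Assumed acceptance criterion: shared characters relative to the longer string."""
    longest = max(len(value), len(candidate))
    return longest > 0 and identical_chars(value, candidate) / longest >= min_share


def find_counterpart(record, field, identifier_values, synonyms, reference):
    """Update the synonym information first, then search the reference data set."""
    value = record[field]
    if value not in identifier_values and value not in synonyms:
        # Assumed candidate selection rule: same first letter as the field value.
        candidates = [c for c in identifier_values if c[:1].lower() == value[:1].lower()]
        accepted = [c for c in candidates if accept(value, c)]
        if len(accepted) == 1:
            # Associate the field value with the predetermined identifier value.
            synonyms[value] = accepted[0]
    # The search uses the synonym if one is associated, else the value itself.
    key = synonyms.get(value, value)
    return next((entry for entry in reference if entry[field] == key), None)


# Hypothetical data: predetermined identifier values and a reference data set.
identifier_values = {"Helsinki", "Tampere"}
reference = [{"city": "Helsinki", "id": 1}, {"city": "Tampere", "id": 2}]
synonyms = {}
match = find_counterpart({"city": "Helsinky"}, "city", identifier_values, synonyms, reference)
```

In this sketch the misspelled value is associated with a predetermined identifier value before the reference data set is searched, so the record finds its counterpart, and later records carrying the same misspelling reuse the stored synonym.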

The updating of the information about synonyms may be automated, because the set of
predefined identifier values exists and a predetermined synonym acceptance criterion is defined.

D1 (Sa Lin et al) discloses methods for overcoming problems relating to the use of databases
when the user's terminology does not match the terminology used in the database. The solution
is to describe the terms and the relationships between the terms (that is, to construct an ontology
model) and to expand user search terms using the ontology model.

Ontology refers to the meaning of terms/words. The ontology model in D1 is typically built in
advance. In D1 search terms are thus expanded before a search based on the meaning of the
search terms using a (typically) static ontology model.

The claimed invention is therefore different from the disclosure of D1 and thus novel in view of
D1.

Furthermore, the aim of the method discussed in D1 is different from the aim of the invention. In
D1, the aim is to expand the search terms for finding larger amounts of relevant information
from the available information. A person skilled in the art of processing structured information,
when trying to increase the accuracy of finding counterparts for data records, would not consult
publications relating to handling terms based on the meaning of the terms. As mentioned above,
the meaning of a term (identifier) is irrelevant for the present invention.

In addition, should the skilled person have a look at D1, he would notice that the ontology model
in D1 is static or it is updated based on user input. In D1 it is expressly said that attempts to
completely automate update of the ontology model have not been very promising (page A33, left
column, above Figure 1).

Thus D1 would not lead a skilled person into defining a synonym acceptance criterion and
updating information about synonyms before searching counterparts for a data record. We thus
find the claimed invention inventive in view of D1.

Regarding the other documents cited in the International Search Report, we would like to 
mention the following. 

WO0141002 relates to use of distributed databases. The user is allowed to make unstructured
queries, and the query terms are generalized and/or expanded to return as many relevant words
as possible to the user. As in D1, the search terms in this publication are processed
based on their meaning.

The publication by Rodriguez and Varas (XP-002297455) discusses ontologies and database
schemas, similarly to D1. In this paper there is no hint to updating a database schema.

The publication by Lujan-Mora and Palomar addresses integration of data from different
sources. The proposed solution is based on clustering: various variants of a term are replaced by
the most frequently occurring variant.

This publication addresses handling of structured data, similarly to the present invention. The
solution to cope with various variants of a term is based on the assumption that the most
frequently occurring variant is correct. There is no explicit knowledge about the correct variant,
in contrast to the present invention where there is a set of predefined identifier values.

Based on the above, we find the claimed invention new and inventive over the cited prior art. A
reconsideration of the statement regarding novelty and inventive step in the Written Opinion of
the International Searching Authority is therefore respectfully requested.

Should the Examiner, despite the enclosed amended claims and the arguments presented above, 
consider issuing a negative International Preliminary Examination report, the applicant expects
to receive a further Written Opinion pursuant to Rule 66.2 PCT. 

Please acknowledge receipt of this letter by returning the top copy of the EPO Form 1037
enclosed with this letter. 

Yours faithfully



Sirpa Kuisma 
Professional Representative 



Encl. - replacement pages 23-27
      - courtesy copy of pages 23-27 showing amendments of the independent claims

Claims

1. A method of processing a data record for finding a counterpart in a reference data 
set, the method comprising the steps of: 

determining in the data record a value of a data field, the data field representing an
identifier,

determining, from a set of predetermined identifier values at least one synonym
candidate for the value of the data field,

determining if a synonym candidate and the value of the data field fulfill a
predetermined synonym acceptance criterion taking into account the number of
identical characters in the synonym candidate and in the value of the data field, and if
the predetermined synonym acceptance criterion is fulfilled, associating the value of
the data field and the synonym candidate as synonyms, and

searching for a counterpart for the data record by comparing to entries of the
reference data set the value of the data field and/or a synonym associated with the
value of the data field after determining whether the predetermined synonym
acceptance criterion is fulfilled.

2. A method as defined in claim 1, wherein the at least one synonym candidate is
determined using a candidate selection criterion depending at least on the value of the
data field and on a synonym candidate.

3. A method as defined in claim 2, wherein the candidate selection criterion takes into
account how similar a synonym candidate and the value of the data field sound.

4. A method as defined in claim 2, wherein the candidate selection criterion specifies
that at least a predetermined part of the value of the data field is identical to a
predetermined part of a synonym candidate.

5. A method as defined in any one of claims 2 to 4, wherein the candidate selection
criterion takes into account also a further data field of the data record, said further 
data field representing a second identifier. 



6. A method as defined in any preceding claim, wherein at least one quality parameter 
is evaluated for a synonym candidate, the synonym acceptance criterion taking into 
account the at least one quality parameter.

7. A method as defined in claim 6, wherein at least one quality parameter takes into
account at least one of the following quantities:

a number of changes required for converting the value of the data field to be identical
to a synonym candidate; a proportion of identical characters in the value of the data
field and in a synonym candidate; and a difference between the length of the value of
the data field and the length of a synonym candidate.

8. A method as defined in claim 7, wherein the number of changes required for 
converting the value of the data field to be identical to a synonym candidate is 
calculated using the Levenshtein distance. 

9. A method as defined in claim 7, wherein the proportion of identical characters takes 
into account the order of the characters. 

10. A method as defined in any one of claims 6 to 9, wherein a first quality parameter
is evaluated for each synonym candidate and at least a second quality parameter is
evaluated, at least for the synonym candidate(s) having the best first quality parameter.

11. A method as defined in any one of claims 6 to 10, wherein the synonym
acceptance criterion requires that there is only one synonym candidate having the best
at least one quality parameter.

12. A method as defined in any one of claims 6 to 11, wherein at least two quality
parameters are evaluated for each synonym candidate and the synonym candidate
acceptance criterion specifies a threshold for one of the at least two quality
parameters, the threshold being dependent on a further one of the at least two quality
parameters.



13. A method as defined in any preceding claim, wherein the search for the
counterpart involves comparison of the value of the data field to a synonym set
relating to the identifier, members of said synonym set referring to respective
predetermined identifier values, and when the predetermined synonym acceptance
criterion is fulfilled, the value of the data field is added to the synonym set as a
member referring to the synonym associated with the value of the data field before the
search for the counterpart.

14. A method as defined in any preceding claim, wherein determining the at least one
synonym candidate is discarded, if a predetermined discard criterion is fulfilled.

15. A method as defined in claim 14, wherein the predetermined discard criterion
specifies that the value of the data field is identical to one of the predetermined 
identifier values. 

16. A method as defined in claim 14, wherein the search for the counterpart involves
the synonym set and the predetermined discard criterion specifies that the value of the
data field is at least one of the following: one of the predetermined identifier values, 
and a member of the synonym set. 

17. A method as defined in any one of claims 14 to 16, wherein the predetermined
discard criterion takes into account a value of a second data field in the data record.

18. A method as defined in any preceding claim, wherein information indicating the at
least one synonym associated with the value of the data field is added to the data 
record. 

19. A method as defined in claim 18, wherein a copy of the data record is made for 
each synonym associated with the value of the data field. 

20. A method as defined in any preceding claim, wherein the identifier relates to a
name of one of the following: a geographical entity, a person and an organisation.

21. A method of processing a synonym set for searching counterparts in a reference
data set for data records, a data record containing a data field representing an
identifier, members of the synonym set being first identifier values and referring to
respective second identifier values, the second identifier values being predetermined
identifier values, and said searching for a counterpart involving comparison of a value
of the data field to the synonym set, the method comprising the steps of determining
among the predetermined identifier values at least one synonym candidate relating to
the value of the data field in the data record, and, if the value of the data field and a
synonym candidate fulfill a predetermined synonym acceptance criterion taking into
account the number of identical characters in the synonym candidate and in the value
of the data field, adding before searching a counterpart for a data record the value of
the data field to the synonym set as a member referring to the synonym candidate.

22. A method as defined in claim 21, wherein the synonym set is empty before adding 
the value of the data field to the synonym set. 

23. A method as defined in claim 21, wherein the synonym set contains at least one
member before adding the value of the data field to the synonym set.

24. A computer program comprising program instructions for causing a computer to 
perform the method of any one of claims 1 to 23. 

25. A computer program as defined in claim 24, embodied on a computer-readable
record medium.

26. A data processing system for processing data records for finding counterparts in a
reference data set, the system comprising:
- means for receiving data records,
- means for storing the reference data set,
- means for storing predetermined identifier values for an identifier,
- means for determining in the data records values of a data field, the data field
representing the identifier,
- means for associating values of the data field and respective predetermined
identifier values as synonyms, said means configured to determine from the
predetermined identifier values at least one synonym candidate for a value of the
data field, to determine if a synonym candidate and the value of the data field
fulfill a predetermined synonym acceptance criterion taking into account the
number of identical characters in the synonym candidate and in the value of the
data field, and if the predetermined synonym acceptance criterion is fulfilled, to
associate the value of the data field and the synonym candidate as synonyms, and
- means for searching counterparts in the reference data set for the data records, said
searching involving comparing to entries of the reference data set values of data
fields and/or synonyms associated with the values of the data fields.

27. A data processing system as defined in claim 26, further comprising
- means for storing a synonym set, members of said synonym set referring to
respective predetermined identifier values,
wherein the means for associating values of the data field and respective
predetermined identifier values as synonyms are configured to add to the synonym set
a member referring to the synonym associated with the value of the data field before
activation of the means for searching counterparts.

28. A data processing system for processing a synonym set for searching counterparts
in a reference data set for data records, a data record comprising a data field
representing an identifier, members of the synonym set being first identifier values
and referring to respective second identifier values, said second identifier values being
predetermined identifier values, and said searching involving comparing a value of
the data field to the synonym set, the system comprising:
- means for storing the synonym set,
- means for storing predetermined identifier values for the identifier,
- means for receiving data records,
- means for determining in the data records values of the data field, and
- means for adding to the synonym set a value of the data field and respective
predetermined identifier values associated as synonyms before searching
counterparts in the reference data set, said means configured to determine from the
predetermined identifier values at least one synonym candidate for a value of the
data field, to determine if a synonym candidate and the value of the data field
fulfill a predetermined synonym acceptance criterion taking into account the
number of identical characters in the synonym candidate and in the value of the
data field, and if the predetermined synonym acceptance criterion is fulfilled, to
associate the value of the data field and the synonym candidate as synonyms.



